Using homology relations within a database markedly boosts protein sequence similarity search.

Authors

Jing Tong

Ruslan I Sadreyev

Jimin Pei

Lisa N Kinch

Nick V Grishin

Abstract

Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
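The core idea in the abstract, judging a hit by how the query relates to the hit's known homologs and not only to the hit itself, can be illustrated with a small Python sketch. Everything below is an assumption made for illustration: the function names, the toy scores, and in particular the averaging rule are hypothetical and are not COMPADRE's published scoring scheme. Precision at half-coverage is read here as the fraction of true homologs among the top-ranked hits at the rank where half of all true homologs have been recovered.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence, Set, Tuple


def rescore_with_homologs(
    query: str,
    hits: Sequence[str],
    known_homologs: Dict[str, List[str]],
    pair_score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Rank hits by the mean score of the query against each hit and its known homologs.

    Averaging is an illustrative assumption, not the published method.
    """
    rescored = []
    for hit in hits:
        group = [hit] + known_homologs.get(hit, [])
        rescored.append((hit, mean(pair_score(query, member) for member in group)))
    # Highest combined score first; sorted() is stable, so ties keep input order.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def precision_at_coverage(
    ranked_hits: Sequence[str], true_homologs: Set[str], coverage: float = 0.5
) -> float:
    """Precision at the rank where `coverage` of all true homologs has been found."""
    if not true_homologs:
        raise ValueError("true_homologs must not be empty")
    needed = math.ceil(coverage * len(true_homologs))
    found = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit in true_homologs:
            found += 1
            if found >= needed:
                return found / rank
    raise ValueError("requested coverage is never reached in this ranking")


# Contrived toy data (hypothetical): a-d are true homologs of the query,
# x and y are unrelated entries with spuriously high direct scores.
toy_scores = {"a": 0.8, "b": 0.6, "c": 0.5, "d": 0.3,
              "x": 0.9, "y": 0.7, "x1": 0.1, "y1": 0.1}
known_homologs = {
    "a": ["b", "c", "d"], "b": ["a", "c", "d"],
    "c": ["a", "b", "d"], "d": ["a", "b", "c"],
    "x": ["x1"], "x1": ["x"], "y": ["y1"], "y1": ["y"],
}
true_homologs = {"a", "b", "c", "d"}
hits = list(toy_scores)


def toy_pair_score(query: str, entry: str) -> float:
    """Stand-in for a real sequence or profile comparison score."""
    return toy_scores[entry]


direct_ranking = sorted(hits, key=lambda h: toy_scores[h], reverse=True)
rescored_ranking = [
    hit for hit, _ in rescore_with_homologs("query", hits, known_homologs, toy_pair_score)
]

direct_precision = precision_at_coverage(direct_ranking, true_homologs)
rescored_precision = precision_at_coverage(rescored_ranking, true_homologs)
print(direct_precision, rescored_precision)
```

In this toy setup the two spurious entries outrank most true homologs when only direct scores are used. Once each hit is scored together with its known homologs, their low-scoring relatives pull them down while the true family members support one another. The numbers are constructed to show the mechanism and say nothing about real-world performance.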

Similar resources

Similarity Search Using Pre-Search in UniRef100 Database

Sequence similarity in biological databases is used to characterize a newly discovered protein and to confirm the existence of its homologs. This is often computationally very expensive. We have implemented a new algorithm that performs sequence similarity search using a pre-search phase. The proposed algorithm works in three phases. As a prepreparation for Pre-Search, we locate a sequence, sim...

Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...

HorA web server to infer homology between proteins using sequence and structural similarity

The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect struct...

Crowdsourcing Protein Family Database Curation

We propose a novel method for crowdsourcing a protein family database. We discuss how we intend to identify novel groupings of proteins from user sequence similarity search, and how text mining will be applied to assist in annotation of these novel groupings, and more broadly as an enrichment of protein sequence similarity search results. We intend to use entity linking to identify literature w...

PFMFind: A System for Discovery of Peptide Homology and Function

Protein Fragment Motif Finder (PFMFind) is a system that enables efficient discovery of relationships between short fragments of protein sequences using similarity search. It supports queries based on amino acid similarity matrices and position specific score matrices (PSSMs) obtained through an iterative procedure. PSSM construction is customisable through plugins written in Python. PFMFind cons...

Journal:

Proceedings of the National Academy of Sciences of the United States of America

Volume 112, Issue 22

Pages: -

Publication date: 2015
